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Abstract 

Online variants of the Expectation Maximization (EM) algorithm have 
recently been proposed to perform parameter inference with large data 
sets or data streams, in independent latent models and in hidden Markov 
models. Nevertheless, the convergence properties of these algorithms re- 
main an open problem at least in the hidden Markov case. This contribu- 
tion deals with a new online EM algorithm which updates the parameter 
at some deterministic times. Some convergence results have been de- 
rived even in general latent models such as hidden Markov models. These 
properties rely on the assumption that some intermediate quantities are 
available in closed form or can be approximated by Monte Carlo meth- 
ods when the Monte Carlo error vanishes rapidly enough. In this paper, 
we propose an algorithm which approximates these quantities using Se- 
quential Monte Carlo methods. The convergence of this algorithm and 
of an averaged version is established and their performance is illustrated 
through Monte Carlo experiments. 



This extended version of the paper "Convergence of a Particle-based Ap- 
proximation of the Block Online Expectation Maximization Algorithm", by S. 
Le CorfF and G. Fort, provides detailed proofs which have been omitted in the 
submitted paper since they are very close to existing results. These additional 
proofs are postponed to Appendix [B] 
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1 Introduction 



The Expectation Maximization (EM) algorithm is a well-known iterative algo- 
rithm to solve maximum likelihood estimation in incomplete data models, see 
[1 4) . Each iteration is decomposed into two steps: in the E-step the conditional 
expectation of the complete log-likelihood (log of the joint distribution of the 
hidden states and the observations) given the observations is computed; and the 
M-step updates the parameter estimate. The EM algorithm is mostly practica- 
ble if the model belongs to the curved exponential family, see PHI Section 1.5] 
and [6*, Section 10.1], so that we assume below that our model belongs to this 
family. Under mild regularity conditions, this algorithm is known to converge 
to the stationary points of the log-likelihood of the observations, see [3^. How- 
ever, the original EM algorithm cannot be used to perform online estimation 
or when the inference task relies on large data sets. Each iteration requires the 
whole data set and each piece of data needs to be stored and scanned to pro- 
duce a new parameter estimate. Online variants of the EM algorithm were first 
proposed for independent and identically distributed (i.i.d.) observations: [S] 
proposed to replace the original E-step by a stochastic approximation using the 
new observation. Solutions have also been proposed in hidden Markov models 
(HMM): [4_ provides an algorithm for finite state-space HMM which relies on 
recursive computations of the filtering distributions combined with a stochas- 
tic approximation step. Note that, since the state-space is finite, deterministic 
approximations of these distributions are available. This algorithm has been 
extended to the case of general state-space models, the approximations of the 
filtering distributions being handled with Sequential Monte Carlo (SMC) algo- 
rithms, see [10, and [25,. Unfortunately, it is quite challenging to address 
the asymptotic behavior of these algorithms (in the HMM case) since the recur- 
sive computation of the filtering distributions relies on approximations which 
are really difficult to control. 

In [53], another online variant of the EM algorithm in HMM is proposed, 
called the Block Online EM (BOEM) algorithm. In this case, the data stream 
is decomposed into blocks of increasing sizes. Within each block, the parameter 
estimate is kept fixed and the update occurs at the end of the block. This 
update is based on a single scan of the observations, so that it is not required to 
store any block of observations. |23) provides results on the convergence and on 
the convergence rates of the BOEM algorithms. These analyses are established 
when the E-step (computed on each block) is available in closed form and when 
it can be approximated using Monte Carlo methods, under an assumption on 
the Lp-error of the Monte Carlo approximation. 

In this paper, we consider the case when the E-step of the BOEM algorithm 
is computed with SMC approximations: the filtering distributions are approxi- 
mated using a set of random weighted particles, see [6j and \S\. The Monte Carlo 
approximation is based on an online variant of the Forward Filtering Backward 
Smoothing algorithm (FFBS) proposed in [4^ and [10^ . This method is appeal- 
ing for two reasons: first, it can be implemented forwards in time i.e. within a 
block, each observation is scanned once and never stored and the approximation 
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computed on each block does not require a backward step - this is crucial in our 
online estimation framework. Secondly, recent work on SMC approximations 
provides Lp-mean control of the Monte Carlo error, see e.g. [19] and [9]. This 
control, combined with the results in [23] , sparks off the convergence results and 
the convergence rates provided in this contribution. 

The paper is organized as follows: our new algorithm called the Particle 
Block Online i?M algorithm (P-BOEM) is derived in Section [2] together with an 
averaged version. Section [3] is devoted to practical applications: the P-BOEM 
algorithm is used to perform parameter inference in stochastic volatility models 
and in the more challenging framework of the Simultaneous Localization And 
Mapping problem (SLAM). The convergence properties and the convergence 
rates of the P-BOEM algorithms are given in Section ]4] 



2 The Particle Block Online EM algorithms 



In Section 12. 1| we fix notation that will be used throughout this paper. We 
then derive our online algorithms in Sections 12.2] and 12.3) We finally detail. 



in Section 2.4 the SMC procedure that makes our algorithm a true online 
algorithm. 



2.1 Notations and Model assumptions 

A hidden Markov model on X x Y is defined by an initial distribution x on (X, X) 
and two families of transition kernels. In this paper, the transition kernels are 
parametrized hy 9 E Q, where 6 C M.'^" is a compact set. In the sequel, the ini- 
tial distribution x on (X, X) is assumed to be known and fixed. The parameter 
is estimated online in the maximum likelihood sense using a sequence of ob- 
servations Y. Online maximum likelihood parameter inference algorithms were 
proposed either with a gradient approach or an EM approach. In the case of 
finite state-spaces HMM, |,26j proposed a recursive maximum likelihood proce- 
dure. The asymptotic properties of this algorithm have recently been addressed 
in [31]. This algorithm has been adapted to general state-spaces HMM with 
SMC methods (see ^ISj ) . The main drawback of gradient methods is the neces- 
sity to scale the gradient components. As an alternative to performing online 
inference in HMM, online EM based algorithms have been proposed for finite 
state-spaces (see [4j) or general state-spaces HMM (see [3], [lOj and [23]). [TU] 
proposed a SMC method giving encouraging experimental results. Nevertheless, 
it relies on a combination of stochastic approximations and SMC computations 
so that its analysis is quite challenging. In [23 , the convergence of an online 
EM based algorithm is established. This algorithm requires either the exact 
computation of intermediate quantities (available explicitly only in finite state- 
spaces HMM or in linear Gaussian models) or the use of Monte Carlo methods 
to approximate these quantities. We propose to apply this algorithm to general 
models where these quantities are replaced by SMC approximations. We prove 
that the Monte Carlo error is controlled in such a way that the convergence 
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properties of [23] hold for the P-BOEM algorithms. 

We now detail the model assumptions. Consider a family of transition kernels 
{ing{x, a;')dA(a;')}6ige on X x A", where X is a general state-space equipped with 
a countably generated cr-field X, and A is a finite measure on (X, A"). Let 
{gg{x,y)di'{y)}gizQ be a family of transition kernels on X x 3^, where Y is a 
general space endowed with a countably generated cr-field y and ;/ is a measure 
on (Y,3^). Let Y = {Yt}t^z be the observation process defined on (f2,P, J^) and 
taking values in Y^. The batch EM algorithm is an offline maximum likelihood 
procedure which iteratively produces parameter estimates using the complete 
data log-likelihood (log of the joint distribution of the observations and the 
states) and a fixed set of observations, see ^14^. In the HMM context presented 
above, given T observations Yi-t, the missing data xq-t and a parameter 9, the 
complete data log-likelihood may be written as (up to the initial distribution x 
which is assumed to be known) 

T 

4(a;o:T, Yi:t) = ^{log'mg{xt-i,Xt) +\oggo{xt,Yt)} , (1) 
t=i 

where we use Xr-.t as a shorthand notation for the sequence (xr, ■ ■ ■ ,Xt), r < t. 
Each iteration of the batch EM algorithm is decomposed into two steps. The E- 
step computes, for all 6* e O, an expectation of the complete data log-likelihood 
under the conditional probability of the hidden states given the observations 
and the current parameter estimate 9. In the HMM context, due to the additive 
form of the complete data log-likelihood ([T|), the E-step is decomposed into T 
expectations under the conditional probabilities ^^'^rpi'iY) where 

^X:'- dot / X(da:r){n*^r nie{xi, Xi+i)ge{xi+i,yi+i)} h{xs^i,Xs, Ys) dX{xr+i:t) 

' J x{'ixr){Yl*iZlmg{xi,Xi+i)ge{x,+i,yi+i)}dX{xr+i:t) 

(2) 

for any bounded function h, any 9 Cz Q, any r < s < t and any sequence 
y e Y^. Then, given the current value of the parameter 9, the E-step amounts 
to computing the quantity 

1 ^ 

Qt {9, 9) - ^ $1;°^ (log me + log 5. , Y) , (3) 

t=i 

for any 6* G 8. The M-step sets the new parameter estimate as a maximum of 
this expectation over 9. 

The computation of i-> Qt{9, 9) for any ^? G 8 is usually intractable except 
in the case of complete data likelihood belonging to the curved exponential 
family, see [29. Section 1.5] and [6, Section 10.1]. Therefore, in the sequel, the 
following assumption is assumed to hold: 

Al (a) There exist continuous functions : 8 — > M, 1/; : 8 M'^ and 
5 : X X X X Y ^ M'^ s.t. 

logmg{x,x ) +\oggg(x',y) = (f>{9) + {S{x,x,' ,y),tl;{9)) , 
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where (•, •) denotes the scalar product on M.^. 

(b) There exists an open subset S of R"^ that contains the convex hull of 
^(X X X X Y). 

(c) There exists a continuous function 6 : S Q s.t. for any s e 5, 

^(s) = argmaxgge {^{9) + {s,^{e))} . 

Under the quantity QT{d,9) defined by (|3| becomes 

QTiej)^m + (^^p^^ll^is,Y)^m^ , (4) 

so that the definition of the function 6 H- Qt{0, requires the computation of 
an expectation ^ J2t=i ^^tr ^^^^^ independently of 0. 

The M-step of the batch EM iteration amounts to computing 

This batch EM algorithm is designed for a fixed set of observations. A 
natural extension of this algorithm to the online context is to define a sequence 
of parameter estimates by 

6*4+1 = argmaxg Qf+i{9,9t) . 

Unfortunately, the computation of Qt+i{0,dt) requires the whole set of obser- 
vations to be stored and scanned for each estimation. For large data sets the 
computation cost of the E-step makes it intractable in this case. To overcome 
this difficulty, several online variants of the batch EM algorithm have been pro- 
posed, based on a recursive approximation of the function 9 i— ^ Qt+i(',^t) (see 
0], [TD] and [23]). In this paper, we focus on the Block Onhne EM (BOEM) 
algorithm, see j23j. 

2.2 Particle Block Online EM (P-BOEM) 

The BOEM algorithm, introduced in [S^, is an online variant of the EM algo- 
rithm. The observations are processed sequentially per block and the parameter 
estimate is updated at the end of each block. Let {Tk}k>i be a sequence of pos- 
itive integers denoting the length of the blocks and set 

n 

Tn ^^Tk and Tq = ; (5) 

fc=i 

{Tk}k>i are the deterministic times at which the parameter updates occur. 
Define, for all integers r > and T > and all e 6, 

5X,T(^^Y)<^=i^l J2 '^fJ,T+riS,Y) . (6) 
t=T+l 
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The quantity S^'^{9,Y) corresponds to the intermediate quantity in Q with 
the observations Yt+i-.t+t- 

The BOEM algorithm iteratively defines a sequence of parameter estimates 
{^ji}ri>o follows: given the current parameter estimate 

(i) compute the quantity Sf^^'^{dn,Y), 

(ii) compute a candidate for the new value of the parameter: 9n+i = (^S^^^" {9m Y) 

To make the exposition easier, we assume that the initial distribution x is the 
same on each block. The dependence of S^''^{6,Y) on x is thus dropped from 
the notation for better clarity. 

The quantity S'^^^_^{9n,Y) is available in closed form only in the case of 
linear Gaussian models and HMM with finite state-spaces. In HMM with gen- 
eral state-spaces S'^"^^ {9m Y) cannot be computed explicitly and we propose to 
compute an approximation of S'^^^_^{9mY) using SMC algorithms thus yielding 
the Particle-BOEM (P-BOEM) algorithm. Different methods can be used to 
compute these approximations (see e.g. [5], [ID] and [in])- We will discuss in 
Section |2.4| below some SMC approximations that use the data sequentially. 

Denote by Sn{9, Y) the SMC approximation of §1^^^-^ (9, Y) computed with 
A^„_i_i particles. The P-BOEM algorithm iteratively defines a sequence 
of parameter estimates {6'„}„>o as follows: given the current parameter 
estimate 

(i) compute the quantity S'„(0„, Y), 

(ii) compute a candidate for the new value of the parameter: 

9n+l — 9 (^Sn{9n,Y)j . 

We give in Algorithm [T] lines 1 to 9 an algorithmic description of the P-BOEM 
algorithm. Note that the idea of processing the observations by blocks is pro- 
posed in |30j to fit a normal mixture model. The incremental EM algorithm 
discussed in f3U' is an alternative to the batch EM algorithm for very large data 
sets. Contrary to our framework, in the algorithm proposed by [30J, the number 
of observations is fixed and the same observations are scanned several times. 

2.3 Averaged Particle Block Online EM 

Following the same lines as in [53], we propose to replace the P-BOEM se- 
quence {9„}n>o by an averaged sequence. This new sequence can be computed 
recursively, simultaneously with the P-BOEM sequence, and does not require 
additional storage of the data. The proposed averaged P-BOEM algorithm is 
defined as follows (see also lines 5 and 6 of Algorithm [T|) : the step ^ of the 
P-BOEM algorithm presented above is followed by 
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(iv) compute the quantity 

Sra+l — " S„ + "^"^ Sn{Om Y) , (7) 

(v) define 

^'n+l = d i^n+ij ■ (8) 

We set So = so that 

S„ = ;^f]r, Vl(e,-l,Y); (9) 

we will prove in Section [4^ that the rate of converjence of the averaged sequence 
{^n}n>Oi computed from the averaged statistics {S„}„>o, is better than the non- 
averaged one. We will also observe this property in Section [3] by comparing the 
variability of the P-BOEM and the averaged P-BOEM sequences in numerical 
applications. 



Algorithm 1 P-BOEM and averaged P-BOEM 
Require: 6*0, {t„}„>i, {JV„}„>i, {Yt}t>a ■ 
Ensure: {6'„}„>o and {6'„}„>o . 

Set Eo = 0. 

for alH > do 

Compute sequentially Si{9i,Y) . 

Set =0(5,(0,, Y)) . 

Set ^ 

Set e,+i = 9 . 
end for 



2.4 The SMC approximation step 

As the P-BOEM algorithm is an online algorithm, the SMC algorithm should 
use the data sequentially: no backward pass is allowed to browse all the data at 
the end of the block. Hence, the approximation is computed recursively within 
each block, each observation being used once and never stored. These SMC 
algorithms will be referred to as forward onlj£ SMC. We detail below a forward 
only SMC algorithm for the computation of S'„(0„, Y) which has been proposed 
by m (see also [10]). 

For notational convenience, the dependence on n is omitted. For block n, 
the algorithm below has to be applied with {t,N) ^ (t„+i,A„+i), Yi-r 
Yt„+i,t„+t„+i and ^ 6^- 
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The key property is to observe that 

5°(0,Y) = 0f(i?e,r) (10) 

where (f)^ is the filtering distribution at time t, and the functions Rt g : X — > S, 
1 < t < T, satisfy the equations 

RtAx)=^-^1{x.S{-,x,Yt)) + ^-^Bl{x,Rt-ifi) , (11) 

where denotes the backward smoothing kernel at time t 

B?(x,dx') = 7K^J\a ■ (12) 

J ■me{u,x)(p1_-^{au) 



By convention, RQ^g{x) — and (pQ = x- ^ proof of the equalities (|10| to (12 1 
can be found in and [T^. Therefore, a careful reading of Eqs ( |10[ ) to (12 1 
shows that, for an iterative particle approximation of S^{9,'Y), it is sufficient 
to update from time t — 1 to t 

(i) N weighted samples {(^j, wf) ;£ g {1, . . . ,-/V}} used to approximate the 
filtering distribution (j)^. 

(ii) the intermediate quantities approximating the function Ri g at 
point X = il, £ e {1, • • • ,iV}. 

We describe below such an algorithm. An algorithmic description is also pro- 
vided in Appendix [A] Algorithm |2] 

Given instrumental Markov transition kernels {qt{x,x'),t < ''■} on X x A" 
and adjustment multipliers {vt^t < r}, the procedure goes as follows: 

(i) line 1 in Algorithm^ sample independently N particles {^olfci with the 
same distribution 



(ii) line 6 in Algorithm^ at each time step t € {1, . . . , r}, pairs {( J/, S,t)}eLi 
of indices and particles are sampled independently (conditionally to Yi-t, 
6 and {(>//_i, Ct-i)}fci) from the instrumental distribution: 

ttSAx) oc a;j_iUt(^j_i)gf(Cj„i,a;)A(da;) , (13) 

on the product space {1, . . . , N} x X. For any t G {1, . . . , r} and any 
£ G {1, . . . , N}, jf denotes the index of the selected particle at time t — 1 
used to produce ^f. 

(iii) line 7 in Algorithm^ once the new particles {£,f}f^i have been sampled, 
their importance weights {wf}^]^ are computed. 



N 



(iv) lines 8 in Algorithm^ update the intermediate quantities {Rf g} 

If, for all a; e X, Vt{x) — 1 and if the kernels qt are chosen such that qt — mg, 
lines 6-7 in Algorithm |2] are known as the Bootstrap filter. Other choices of qt 
and vt can be made, see e.g. [6]. 
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3 Applications to Bayesian inverse problems in 
Hidden Markov Models 



3.1 Stochastic volatility model 

Consider the following stochastic volatility model: 

Xt+i = (bXt + aUt , Yt = l3e^Vt, 

where Xq ~ Af (O, (1 — (/)^)~^(T^) and {Ut}t>o and {Vt}t>o are two sequences of 
i.i.d. standard Gaussian r.v., independent from Xq. 

We illustrate the convergence of the P-BOEM algorithms and discuss the 
choice of some design parameters such as the pair (r„, A'n). Data are sampled 
using (j) = 0.95, = 0.1 and = 0.6; we estimate 9 = (0, cr^, (3^) by applying 
the P-BOEM algorithm and its averaged version. All runs are started from 
(j, = 0.1, cr2 ^ 0.6 and = 2. 

Figure [l] displays the estimation of the three parameters as a function of the 
number of observations, over 50 independent Monte Carlo runs. The block-size 
sequence is of the form t„ oc n^-^. For the SMC step, we choose Nn = 0.25 • t„; 
particles are sampled as described in Algorithm [2] (see Appendix |A| with the 
bootstrap filter. For each parameter. Figure [T] displays the empirical median 
(bold line) and upper and lower quartiles (dotted line). The averaging procedure 
is started after 1500 observations. Both algorithms converge to the true values 
of the parameters and, once the averaging procedure is started, the variance 
of the estimation decreases (estimation of (j) and (3'^). The estimation of cr^ 
shows that, if the averaging procedure is started with too few observations, the 
estimation can be slowed down. 

We now discuss the role of the pairs (t„, Nn). Roughly speaking (see sectionjl] 
for a rigorous decomposition), t controls the rate of convergence of 5^(6*, Y) 
to limT-^oo '5^(^?, Y); and N controls the error between S'^{d,Y) and its SMC 
approximation. We will show in Section [4] that lim„ t„ = lim„ Nn = +oo 
are part of some sufficient conditions for the P-BOEM algorithms to converge. 
We thus choose increasing sequences {t„, A'^,i}„>i. The role of t„ has been 
illustrated in [23, Section 3]. Hence, in this illustration, we fix t„ and discuss 
the role of Nn. Figure |2] compares the algorithms when applied with t„ oc n^'^ 
and Nn = \Jt\i or Nn = t„. The empirical variance (over 50 independent 
Monte Carlo runs) of the estimation of is displayed, as a function of the 
number of blocks. First, Figure [2] illustrates the variance decrease provided by 
the averaged procedure, whatever the block size sequence. Moreover, increasing 
the number of particles per block improves the variance of the estimation given 
by the P-BOEM algorithm while the impact on the variance of the averaged 
estimation is less important. On average, the variance is reduced by a factor of 
3.0 for the P-BOEM algorithm and by a factor of 1.8 for its averaged version 
when the number of particles goes from Nn = to Nn — Tn . These practical 
considerations illustrate the theoretical results derived in Section l44l 

Finally, we discuss the role of the initial distribution x- ^ the applications 
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Number o1 observation: 



(a) Estimation of 0. 



Number of observations 

(b) Estimation of </>. 




Number of observatio 



(c) Estimation of cr^ 




Number of observations 



(d) Estimation of cr^ 




Number of observations 



(e) Estimation of 



Number of observations 



(f ) Estimation of 



Figure 1 : Estimation of 4>, and without (left) and with (right) averaging. 
Each graph represents the empirical median (bold line) and upper and lower 
quartiles (dotted line) over 50 independent Monte Carlo runs. The averaging 
procedure is started after 1500 observations. The first 1000 observations are not 
displayed for better clarity. 



above, we have the same distribution x = A/" (O, (1 — 0^)^^(t^) at the beginning 
of each block. We could choose a different distribution Xn for each block such 
as, e.g., the filtering distribution at the end of the previous block. We have 
observed that this particular choice of Xn leads to the same behavior for both 
algorithms. 

To end this section, the P-BOEM algorithm is compared to the Online EM 
algorithm outlined in [4J and [TD]. These algorithms rely on a combination of 
stochastic approximation and SMC methods. According to classical results on 
stochastic approximation, it is expected that the rate of convergence of the 
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(a) P-BOEM: empirical variance of the estimation of with Nn 
^Jt^ (dashed Une) and Af„ = r„ (bold Une). 
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(b) Averaged P-BOEM: empirical variance of the estimation of 0^ 
with Nn = s/tWi (dashed line) and Nn = t„ (bold line). 

Figure 2: Empirical variance of the estimation of with the P-BOEM al- 
gorithm (top) and its averaged version (bottom). The averaging procedure is 
started after the 25-th block and the variance is displayed after a burn-in time 
of 35 blocks. 



1 /2 

Online EM algorithm behaves like 7,/ , where {7„}„>o is the so called step-size 
sequence. Hence, 7„ in the Online EM algorithm is chosen such that 7„ cx n"*^'^^ 
and the block-size sequence in the P-BOEM algorithm such that r„ cx v} '^ . 
The number of particles used in the Online EM algorithm is fixed and chosen 
so that the computational costs of both algorithms are similar. Provided that 
Nn oc Tn in the P-BOEM algorithm, this leads to a choice of 70 particles for the 
Online EM algorithm. We report in Figure [sj the estimation of (j> and for 
a Polyak-Ruppert averaged Online EM algorithm (see [33J) and the averaged 
P-BOEM algorithm as a function of the number of observations. The averaging 
procedure is started after about 1500 observations. As noted in [27, Section 3] 
for a constant sequence {A'^„}.,i>o this figure shows that both algorithms behave 
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similarly. For the estimation of (j) and /3^, the variance is smaller for the P- 
BOEM algorithm and the convergence is faster for the P-BOEM algorithm in 
the case of 0^ . Conclusions are different for the estimation of ct^ : the variance 
is smaller for the P-BOEM algorithm but the Online EM algorithm converges 
a bit faster. The main advantage of the P-BOEM algorithm is that it relies on 
approximations which can be controlled in such a way that we are able to show 
that the limiting points of the P-BOEM algorithms are the stationary points of 
the limiting normalized log-likelihood of the observations. 



3.2 Simultaneous Localization And Mapping 

The Simultaneous Localization And Mapping (SLAM) problem arises when a 
mobile device wants to build a map of an unknown environment and, at the same 
time, has to estimate its position in this map. The common statistical approach 
for the SLAM problem is to introduce a state-space model. Many solutions 
have been proposed depending on the assumptions made on the transition and 
observation models, and on the map (see e.g. [28 and [32 ). In [28J and [25] . 
it is proposed to see the SLAM as an inference problem in HMM: the localization 
of the robot is the hidden state with Markovian dynamic, and the map is seen as 
an unknown parameter. Therefore, the mapping problem is answered by solving 
the inference task, and the localization problem is answered by approximating 
the conditional distribution of the hidden states given the observations. 

In this application, we consider a statistical model for a landmark-based 
SLAM problem for a bicycle manoeuvring on a plane surface. 

Let Xt '= {a;t,i}f=i be the robot position, where Xt^i and Xt^2 are the robot's 
cartesian coordinates and Xt,2. its orientation. At each time step, deterministic 
controls are sent to the robot so that it explores a given part of the environment. 
Controls are denoted by {vt^ tpt) where ipt stands for the robot's heading direction 
and Vt its velocity. The robot position at time t, given its previous position at 
time t — 1 and the noisy controls {vt,ipt), can be written as 

Xt = f{xt-i,vt,i^t) , (14) 

where (wtji/^t) is a 2-dimensional Gaussian distribution with mean (vt,ipt) and 
known covariance matrix Q. In this contribution we use the kinematic model of 



the front wheel of a bicycle (see e.g. [I]) where the function / in (14 1 is given 

by 

(Vtdt cos(a;(_i,3 + li^tY 
Vtdt sin(a;f_i^3 + -0t) 
VtdtB^^ sin(i/)t) 

where dt is the time period between two successive positions and B is the robot 
wheelbase. 

The 2-dimensional environment is represented by a set of landmarks 9 = 
{dj}i<j<q, Oj E C being the position of the j— th landmark. The total number 
of landmarks q and the association between observations and landmarks are 
assumed to be known. 
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Figure 3: Estimation of cr^ and with the averaged P-BOEM algorithm 
(left) and a Polyak-Ruppert averaged version of the Online EM algorithm (right) 
after 300, 1500, 5000, 10000, 20000 and 45000 observations. The averaging pro- 
cedure is started after about 1000 observations (which corresponds to the 25-th 
block for the P-BOEM algorithm). 

At time t, the robot observes the distance and the angular position of all 
landmarks in its neighborhood; let Ct C {1, • • • , g} be the set of observed land- 
marks at time t. It is assumed that the observations {yt,i}i<£ct £^re independent 
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and satisfy 

Vt,i = h{xt,Bi) + 5t,i , 

where h is defined by 

dof ( - XiY + {1^2 - X2Y 



^(^''^^ - { arctan^-X3 

and the noise vectors {5t^i]t,i are i.i.d Gaussian J\f{0,R). R is assumed to be 
known. 

The model presented in this Section does not take into account all the issues 
arising in the SLAM problem (such as the association process which is assumed 
to be known and the known covariance matrices). The aim is to prove that 
the BOEM algorithm and its averaged version have satisfying behavior even 
in the challenging framework described above. The observation and motion 
models are highly nonlinear and we show that the BOEM algorithm remains 
stable in this experiment. Several solutions have been proposed to solve the 
association problem (see e.g. [2J for a solution based on the likelihood of the 
observations) and could be adapted to our case. We want to estimate 9 = 
by applying the P-BOEM algorithms. In this paper, we use simulated 
data, q = 15 landmarks are drawn in a square of size 45TOx45m. The robot 
path is sampled with a given set of controls. Using the true positions of all 
landmarks in the map and the true path of the robot (see the dots and the 



bold line on Figure 



4|, observations are sampled by setting: R — ["j^ ^2 



where ar = 0.5m. — ^rad and p — 0.01. We choose Q — diag(cr^, cr|) where 

(jy = 0.5m.s~^, CT^ = (i)'^^^ ^^'^ ^ ~ l-Sm. 

In this model, the transition denoted by mg does not depend on the map 6 



(see (14)) and the marginal likelihood gg is such that the complete data likeli- 



hood does not belong to the curved exponential family: 



'y_^\TLgg{xt,yt,i) oi^^[yt,i-h{xt,9i)]* R ^ [yt,i ~ h{xt,9i)] . (15) 

Get iect 



Hence, in order to apply Algorithm [T] at the beginning of each block, gg is 
approximated by a function depending on the current parameter estimate so 
that the resulting approximated model belongs to the curved exponential family 



(see |25]V As can be seen from ( 15 ), approximating the function k i— > h{x, n) by 
its first-order Taylor expansion at 9i leads to a quadratic approximation of gg. 
This approach is commonly used in the SLAM framework to use the properties 
of linear Gaussian models (see e.g. [2]). 

As the landmarks are not observed all the time, we choose a slowly increasing 
sequence {t„ cx n^'^}n>i so that the number of updates is not too small (in this 
experiment, we have 60 updates for a total number of observations of 2000). As 
the total number of observations is not so large (the largest block is of length 
60) , the number of particles is chosen to be constant on each block: for all ri > 1, 
Nn — 50. For the SMC step, we apply Algorithm [2] with the bootstrap filter. 
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For each run the estimated path (equal to the weighted mean of the particles) 
and the estimated map at the end of the loop (T — 2000) are stored. Figure |4] 
represents the mean estimated path and the mean map over 50 independent 
Monte Carlo runs. It highlights the good performance of the P-BOEM algorithm 
in a more complex framework. 




Figure 4: True trajectory (bold line) and true landmark positions (balls) with 
the estimated path (dotted line) and the landmarks' estimated positions (stars) 
at the end of the run (T = 2000). 

We also compare our algorithm to the marginal SLAM algorithm proposed 
by [28J. In this algorithm, the map is also modeled as a parameter to learn in 
a HMM model; SMC methods are used to estimate the map in the maximum 
likelihood sense. The Marginal SLAM algorithm is a gradient-based approach 
for solving the recursive maximum likelihood procedure. Note that, in the case 
of i.i.d. observations, ^35j proposed to update the parameter estimate each time 
a new observation is available using a stochastic gradient approach. Figure |5] 
illustrates the estimation of the position of each landmark. The P-BOEM algo- 
rithm is applied using the same parameters as above and the marginal SLAM 
algorithm uses a sequence of step-size {7„ oc 7t,~''-^}„>i. We use the averaged 
version of the P-BOEM algorithm and a Polyak-Ruppert based averaging pro- 
cedure for the marginal 5'LylM algorithm (see [33J). For each landmark the last 
estimation (at the end of the loop) of the position is stored for each of the 50 
independent Monte Carlo runs. Figure [5] displays the distance between the esti- 
mated position and the true position for each landmark. In this experiment, the 
P-BOEM based SLAM algorithm outperforms the marginal SLAM algorithm. 

4 Convergence of the Particle Block Online EM 
algorithms 

In this section, we analyze the limiting points of the P-BOEM algorithm. We 
prove in Theorem |4.3| that the P-BOEM algorithm has the same limit points 
as a so-called limiting EM algorithm, which defines a sequence {On}n>Q by 
6l„+i = e[S{9n)] where S{9) is the a.s. limit lim^^+oo 5^(6', Y) (defined by 
([g])). As discussed in [23", Section 4.3.], the set of limit points of the limiting EM 
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Figure 5: Distance between the final estimation and the true position for each 
of the 15 landmarks with the averaged marginal SLAM algorithm (left) and the 
averaged P-BOEM algorithm (right). 

algorithm is the set of stationary points of the contrast function £{9), defined 
as the a.s. limit of the normalized log-likelihood of the observations, when 
T — >■ +00. The convergence result below on the P-BOEM algorithm requires 
two sets of assumptions: conditions A|2]to A[5]are the same as in 123] and imply 
the convergence of the BOEM algorithm; assumptions A|6]and A[7]are introduced 
to control the Monte Carlo error. 

4.1 Assumptions 

Consider the following assumptions 

A2 There exist cr_ and o'_|_ s.t. for any {x,x') £ and any 6 G Q, < < 
mg{x, x') < (7+. Set p =^ 1 — ((T_/o'+) . 

Define, for all y G Y, 

^-(y)='inf I 9e{x,y)\{Ax) and 6+(y) =^ sup / ge(a;, y)A(da;) . (16) 

For any sequence of r.v. Z =^ {Zt\t& on (f2,P, J^), let 

J-f ='a({Z„}„<fc) and a ({Zj„>fe) (17) 

be (T-fields associated to Z. We also define the mixing coefficients by, see [7], 

P^{n) sup/?(gf+„, J-f ) ,V n > , (18) 

tiGZ 

where for any tr-algebras F and Q, 

Pig,T) sup \V{B\T)-nB)\ . (19) 
Beg 
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For p > and Z a M'^-valued random variable measurable w.r.t. the a-algebra 
J", set 



A3-(p) ||sup^,,,gx2 \S{x,x',Yo)\\\^ < +00 . 

A4 (a) Y is a stationary sequence such that there exist C G [0, 1) and /3 € 
(0,1) satisfying, for any n > 0, (3^ [n) < C/3", where f3^ is defined 



in ( 18 1 



(b) E[|log5_(Yo)| + |log6+(Yo)|] 

A5 There exist c > and a > 1 such that for all n > 1, t„ = [cn°J . 

Assumptions i^to Ajsjare the same as in ^23j. A[2] referred to as the strong 
mixing condition, is used to prove the uniform forgetting property of the initial 
condition of the filter, see e.g. [TT] and [T^]. This assumption is easy to check 
in finite state-space HMM or when the state-space is compact when the Markov 
kernel mg is sufficiently regular. As noted in it can fail to hold in quite 
general situations. Nevertheless, the exponential forgetting property needed 
to ensure the convergence results could be checked under weaker assumptions 
(see |15] for a Doeblin assumption). However, it would imply quite technical 
supplementary results out of the scope of this paper. Examples of observation 
sequences satisfying A|4] include, for example, stationary '0-irreducible and pos- 
itive recurrent Markov chains which are geometrically ergodic (see e.g. for 
Markov chains theory). 

We need to control the Lp-mean error on each block between S'^^^(6'„ , Y) 



and its SMC approximation. This control is discussed in Section [4 . 2 1 below when 
the SMC approximation is computed as described in Section [2^4] 

4.2 Lp-error of the SMC approximation 

For each block n, denote by {vt.n}t<T„+i ^-nd {(j't,n}t<T„+i respectively the ad- 
justment multipliers and the instrumental kernels in the SMC propagation step 



(see (13)). For all y e Y, define 

me{x,x')ge{x',y) 
^+(y) = sup sup — 

t>0,n>0 

Consider the following assumptions. 

A6 \v\oo =^ SUpj „ \vt.n\oo < 00. 



A7-ip) 



f'-(Yo) 



< +00 . 

P 
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In the case of the Bootstrap filter, A|6] holds (since vt^n = 1) and uj^{y) = 
sup sup geix,y). 

Proposition 4.1. Let S* : x Y — > R'^ be a measurable function s.t. J^(p) 

holds for some p > 2. Assume ^^j^j Define Ap =^ 2pp/{p — p) and 

assume ^(Ap) holds for some p g (2,p). Then, there exists a constant C s.t. 
for all n > 0, 



< C 



Nn+i r'/J.N'i^, 

^ n+l n+1 , 



where Y) is computed with the algorithm described in Section 2.4 



4.3 Asymptotic behavior of the Particle Block Online EM 
algorithms 

Following |i23j, we address the convergence of the P-BOEM algorithm as the 
convergence of a perturbed version of the limiting EM recursion. The following 
result, which is proved in Theorem 4.1.], shows that when r is large, the 
BOEM statistic 5^(6', Y) is an approximation of a deterministic quantity S{9); 
the limiting EM algorithm is the iterative procedure defined by 9n+i = R(0n) 
where 

R(6i) =^^(S(6')),V6iee; (20) 
the mapping 9 is given by A[i] 

Theorem 4.2. Let S : X'^ x Y — > M'* be a measurable function s.t. j^(1) 
holds. Assume and J^^^. For any 9 £ Q, there exists a F-integrable r.v. 
E, [5(X_i,Xo, Yo)|Y] s.t. for any T > 0, 

S^{9,Y) ^ S(0)''=^'e[E,[5(X_i,Xo,Yo)|Y]] , P - a.s. (21) 

Moreover, 9 S{9) is continuous on O. 

The asymptotic behavior of the limiting EM algorithm is addressed in |23l 
Section 4.2]: the main ingredient is that the map R admits a positive and 
continuous Lyapunov function W w.r.t. the set 

£ {9 e 6; R{9) = 9} , (22) 

i.e. (i) W o R{9) > W{9) for any 9 E Q and, (ii) for any compact subset K, of 
e\£, infeeyc WoR(6l)-W(6l) > 0. This Lyapunov function is equal to exp(^(6')), 
where the contrast function £{9) is the (deterministic) limit of the normalized 
log- likelihood of the observations when T — >■ +00 (see |24( Theorem 4.9]). 

Theorem |4.3| establishes the convergence of the P-BOEM algorithm to the 



set C defined by (22). The proof of Theorem 4.3 is an application of |23l The- 



orem 4.4]. An additional assumption on the number of particles per block is 



required to check |23l A6] (note indeed that below and Proposition 4.1 imply 
the condition in [23; about the Lp-control of the error). 
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A8 There exist c > and d> {a + l)/2a (where a is given by such that, 
for all n > 1, iV„ = [cr^J . 

Theorem 4.3. Assume 4^(p), J^H^ and J^for some p > 2. Define 

Ap'= 2pp/(p—p) and assume A%(Ap) holds for some p Q (2,p). Assume in ad- 
dition that W(>C) has an empty interior. Then, there exists w^, s.t. {W(0„)}„>o 
converges almost surely to w^, and {0„}„>o converges to {9 G C;W{6) = w*}. 



The assumption on W(£) made in Theorem 4.3 is in common use to prove the 
convergence of EM based procedures or stochastic approximation algorithms. It 
is used in [36, to find the limit points of the classical EM algorithm. See also [T3] 
and PU] for the stability of the Monte Carlo EM algorithm and of a stochastic 
approximation of the EM algorithm. If W is sufficiently regular, Sard's theorem 
states that W(£) has Lebesgue measure and hence has an empty interior. 

Under the assumptions of Theorem |4.3| it can be proved that, along any 
converging P-BOEM sequence {0„}n>o to 6** in C, the averaged P-BOEM statis- 
tics {S„}„ defined by ([t]) (see also ^) converge to S{9^,), see Proposition 



5.2 



Since is continuous, the averaged P-BOEM sequence {0n}n>o converges to 
9{S{0^)) = R{e^). Since 0^ € C, R(6'*) = 0^, showing that the averaged P- 
BOEM algorithm has the same limit points as the P-BOEM algorithm. 

4.4 Rate of convergence of the Particle Block Online EM 
algorithms 

In this section, we consider a converging P-BOEM sequence {0n}n>o with lim- 
iting point 0^, e £. It can be shown, as in [24( Proposition 3.1], that the 
convergence of the sequence {0n}n>o is equivalent to the convergence of the suf- 
ficient statistics {S'„(0„, Y)}„>o: along any P-BOEM sequence converging to 
0*, this sequence of sufficient statistics converges to s^, = S{^^,). Let G : S S 
be the limiting EM map defined on the space of sufficient statistics by 

G(s) =^ S{0{s)) , Vs e 5 . (23) 
To that goal consider the following assumption. 

A9 (a) S and are twice continuously differentiable on O and S. 
(b) sp(VsG(Si,)) e (0, 1) where sp denotes the spectral radius. 

We will use the following notation: for any sequence of random variables 
{Zn}n>Q, write Zn = Olp(1) if limsup„ ||Z„||p < oo; and Z„ = Oa.s(l) if 
sup„ \Zn\ < +00 P — a.s. 



Theorem 4.4. Assume 45 

dcf , 



(p), and J^^for some p > 2. Define 

Ap 2pp/{p — p) and assume Al-(Ap) holds for some p e (2,p). Then, 

[0n - 0A llim. e,=e. = Ol,,, (^-^) Oa.s (1) ■ (24) 
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On the other hand, for the averaged sequence 



P/2 \ rpl/2 

-L n 



Oa.s (1) 



(25) 



The proof of Theorem 4.4 is obtained by checking the assumptions of 



Theore m 5. 1 and Theorem 5.2]. 



Eq. ( 24 ) shows that the error 0„ — 9^, has a Lp/2-iiorm decreasing as t„ 



1/2 



This result is obtained by assuming ~ t^, with d > {a+l)/2a, which impHes 
that the SMC error and the BOEM error are balanced. Unfortunately, such a 
rate is obtained after a total number of observations r„; therefore, as discussed 



in [53] , it is quite sub-optimal. Eq ( 25 1 shows that the rate of convergence equal 
to the square root of the total number of observations up to block n, can be 
reached by using the averaged P-BOEM algorithm: the Lp/2-iiorm of the error 

— ■ —1/2 

— 0* has a rate of convergence proportional to r„ . Here again, note that 
since Nn is chosen as in AlSlthe SMC error and the BOEM error are balanced. 



5 Proofs 



dcf 



For a function h, define osc(h) ~ sup^ ^, \h(z) — 



5.1 Proof of Proposition 4.1 



For any t £ {0, . . . , t„+i}, define the cr-algebra J^^t^^ by 

^n,r" ^ {^»' YT„+i:T„+t+i, (d, ; ^ e {1, . . . , 7V„+i}; < s < i} . (26) 

We use Ss[x^x') as a shorthand notation for S[x^x' s). Under A|2]and 
Propositions B.5., B.8. and B.9. in Appendix B of [52] can be applied so that 



E 



5„(0„,Y) ~ 5^;^^(^„, Y) < C(/i,„ + /2,„) 



(27) 



where 



T dcf -L \ ^ „ 

1 

T„+lA^^+l 
-r„+i 



^±^^5:pi-^iosc{5.+.„} 



^ dcf 



^_l_(Yt+T„ 
&-(Yt+T„ 



2p 



E 



T„ + l 



^ pl*-^l0Sc{5,+Tj 



pIv 



By the Holder inequality applied with a'= p/p > 1 and j3 ^ 1 — a ^ 



dcf 



/i,„ < 



1 



r„ + i 
'^n+l ^n+l t=l 



T„ + i 



Ep'*'^'osc{5,+t„} 



W+(Yt+T„) 



2p 



2pp/{p-p) 
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By A|2] A|3}(p), A|4}|a]) and J^{Ap), we have 



C 



^n+l^^n+1 

Using similar arguments for /2^„ yields /2_„ < C N^^^, which concludes the 
proof. 

5.2 Lp-controls 

Proposition 5.1. Let S* : x Y — > M'' be a measurable function s.t. J^(p) 
holds for some p > 2. Assume and ^^(/S.p) for some p G (2,p), 

where Ap = 2pp/{p — p). There exists a constant C s.t. for any n>l, 



<c 



Proof Under A[2] J^{p) and Aji] by US Theorem 4.1], there exists a constant 
Cs.t. 

5j" ,(0„,Y)-s(0,; - ^ 



< 



Moreover, under A[6]and A[7]-(Ap), by Proposition 4.1 we have 
which concludes the proof. 



\ n-\-l n-\-l , 



□ 



Proposition 5.2. Lei 5* : x Y — )■ M."^ be a measurable function s.t. J^(p) 
holds for somep > 2. Assume ^IMl ^(^P) for some p G {2,p), 

where Ap " 2pp/{p-p). Let {9n}n be the P-BOEM sequence. For any Oi, € Q, 
on the set {lim„ 0„ = 0^}, 

E„^S(0,), P-a.s. , 
where S is defined in (21 1 and E„ m ([t]). 



Proof. By ([T]), S„ can be written as 

-i- n. . ^ -L n. . ^ 



j=i 



By Theorem 4.2 S is continuous so, by the Cesaro Lemma, the second term in 



the right-hand side of (28l converges to S(6'*) P-a.s., on the set {lim„ 6'„ = 6^}. 



By Proposition |5.1| there exists a constant C such that for any n, 

1 1 



<S'„(6'„, Y) — S(6'„) 



< C 
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Hence, by A[5] A[8]and the Borel-Cantelli Lemma, 



(0„,Y)-S(0„) 



0, P-a.s. 



The proof is concluded by applying the Cesaro Lemma. □ 



A Detailed SMC algorithm 

In this section, we give a detailed description of the SMC algorithm used to 
compute sequentially the quantities S'„(0„,Y), n > 0. This is the algorithm 
proposed by |4] and flO] . 

At each time step, the weighted samples are produced using sequential im- 
portance sampling and sampling importance resampling steps. Li Algorithm [2] 
the instrumental proposition kernel used to select and propagate the particles 



is TTt (see ([T3| and (HHZIIIT] for further details on this SMC step). 

It is readily seen from the description below that the observations Yt are 
processed sequentially. 

Algorithm 2 Forward SMC step 
Require: ^ein, t„+i, N, YT„+i:T„+r,.+i ■ 
Ensure: S'„(0„,Y) . 

Sample {^g}^]^ i.i.d. with distribution x . 
Set ujfi = 1/N for all £ e {1, . . . , TV} . 
Set R^Q g^ = for alH e {1, . . . , TV} . 
for i = 1 to Tn+i do 
for ^ = 1 to do 

Conditionally to {On,YT„+i:T„+tAJt~-n^t-i}iLi)j sample independently 
(Jt^^t) ~ 7rt(i,da;) , where TTt{i,dx) oc ul_j^vt{£,l-i)qt{CLi^x)H(^x) . 



Set 



Set 



~ Tl: 1 



7 = 1 l^k=l^t-l^BA>it-l^<i. 



end for 
end for 

Set 



N 
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B L„-controls of SMC approximations 



In this section, we give further details on the Lp control on each block (see ( 27 1 ) : 

S'^ is defined by ([6| (we recall that, x being fixed, it is dropped from the 
notations) and 5^^^ is the SMC approximation of 5^ based on N particles 
computed as described in Section [2.4[ 

The following results are technical lemmas taken from |16| (stated here for 
a better clarity) or extensions of the Lp controls derived in fT^ . 

Hereafter, "time f ' corresponds to time t in the block n. Therefore, even if 
it is not explicit in the notations (in order to make them simpler) , the following 
quantities depend upon the observations Y7'^+i:7;^+,-^^j^ . 

Denote by 0^ the filtering distribution at time s, and let 

■P* J ■mg{u,x)(l)l[A.u) 

be the backward kernel smoothing kernel at time t + 1 . For all < s < t — 1 and 
for all bounded measurable function h on X'^^''+^, define recursively 
backward in time, according to 

€.r\M - j ■ ■ j ^le[Xs+lAXs)<PU^..r\Mx,+^..,)h{Xs..r) , (29) 

starting from (/>^.^|^ ~ (f>r- By convention, 0g — x- 

For t > 1, let { (^j , cjj)}^j^ be the weighted samples obtained as described in 
Section 2.4 (see also Algorithm l2] in Appendix [A| ; it approximates the filtering 



distribution (j)^. Denote by (p^ ' this approximation. For 0<s<r — 1, an 
approximation of the backward kernel can be obtained 

1=1 Z^^=i '^s'^elssj -^7 



and inserting this expression into ( 29 1 gives the following particle approximation 
of the fixed-interval smoothing distribution (/'q.^|^[/i] 



N N 



(30) 

with i^f ='Ef=iW^ 
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Lemma B.l. Let {(^|, wf), 1 < I < N,0 < t < Tn+i} be the weighted samples 
obtained by Algorithm^in Appendix^A\ with 6n, Tn+i, N, yTn+i:T„+T„+i ■ Then, 



= {^0■:rt^^\r^+^ [S-.+i] " '/'o"r„+i |r„+i [Sr„ + i]) , (31) 



where 



(32) 



For alH G {0, . . . , r} and all bounded measurable function h on X'^+^, define 
the kernel L^,^ : X*+i x X®''^^ ~> [0, 1] by 

Ll.^h{xo;t) '= Yl rn0{xu-i,Xi,)g0{xu,Yu+T)h{xo:r)H<ixt+i:r) ; (33) 
^ + 1 1 



ji=t+i 



by convention, L^^/i = h. Let /if^;^ and be two kernels on X x X'®i^+'^) 
defined for all G X by 

ClM^t)"^' J Bl,_^ixtAxt-i) ■ ■ ■Bl,{xudxo)l.lM^O:t) (34) 
Cffh{xt)'^= B^N.eixt,dxt-i) ■ ■ ■B^N,oixi,dxo)h^ ^h{xQ.t) . (35) 



Note that 



Cl,l{xt) = J me{xux')ge{x\YT+t+i) £f+i^,l(x')A(dx') . (36) 



Lemma B.2 Proposition B.3 Lemma B.4 and |B.5| can be found in [ 16] . 



Lemma B.2. Let {(^(,a;f), 1<^<-/V,0<i< t„-|_i} be the weighted samples 
obtained by Algorithm^in Appendix^A\ with 9n, t„+i, N, YT„+i:T„+r„+i • Then, 



(37) 



wii/i Cf^^^ is a kernel on X x defined, for all x G 'K and all bounded 

and measurable function h on X'^+^, by 



Gt,r Kx) = A,; Kx) 



(38) 
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Proof. By definition of Lj ^ , 



We write 

,N.e 



where we used the convention 

iN,0 



We have for all < t < r. 



Therefore, for all 1 < t < r, 

□ 

Proposition B.3. Let |(^|,a;|), 1<£<-/V, 0<i< t„+i} he the weighted sam- 
ples obtained by Algorithm^in Appendix^^ with input variables 0„,r„+i, N, 
YT„+i:T„+r„+i • Then, 



Tn+l 



t=0 



1 w 



T'n+l 



t=0 



where S^- is given by (32) and 



T^N,e„ /,\ del 









t-1 bt] 




■ ^ 1 " 







dcf 



^ = 1 l^t,T„ + il|oO 



(39) 





■ £f", 1 " 

t-l.x„^.i 







(40) 
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Proof. (37) can be rewritten as follows: 

t=0 t=0 

The proof is concluded by Lemma pB.l[ □ 
For any t £ {0, . . . , t„+i}, we recall the definition of J^^^ given by ( [26| 

J-,^, = YT„+i:T„+t+i, (ef,^f) ;^ € {1, . . . ,A^};0 < s < , 

where {(^^ , are the weighted samples obtained by Algorithm |2] in Ap- 
pendix [A| with input variables 0„,r„+i, N, YT„+i:T„+r„+i • 

Lemma B.4. Let {{^f,ujf), 1 < I < N,0 < t < Tn+i} be the weighted samples 
obtained by Algorithm^in Appendix^A\ with 0„, t^+i, N , 'YTn+i-.Tn+Tn+i ■ Then, 
for any 1 < t < r„+i and any I < £ < N , 



[/ mg^{-,x)ge„ix,YT^+t) h{x) \{Ax)\ 



4>f-\" [vt] 



(42) 



Proof. By definition of the weighted particles 



Met) 



N 

n,t-l 



N 



N 



^ ??T-e„ {Ct- 1 , x)ge^ {x, Yt+T„ ) 
^ / ujl_img^{Q-i,x)ge^ix,Yt+Tjh{x)X{dx 



h{x)X{dx) 



□ 



Lemma B.5. Assume and M Let l<^<iV,0<t< r„+i} &e 

</ie weighted samples obtained by Algorithm^ in Appendix^^ with input vari- 
ables 9n,Tn+l, N, YT„+l:T,.+r„+i- 

(i) For any t G {0, . . . , t„+i} and any measurable function h on X'^"+i+^, the 

N 



faj conditionally independent and identically distributed given J-^t-i j 
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(b) centered conditionally to 
(a) For any i G {0, . . . ,t„+i}; 



r„ + i 



s=l 



where St is defined by ( |32[ ). 
('iii^ For a// a; G X anrf anj/ t G {0, . . . , t„+i} 



< ^/-%sc{5(.,.,Y,+tJ} 



(43) 



0-+ 



£^1, l(x) cr2 



Proof. The proof of is given by [16^, Lemma 3] . 

Proof of ([n]). Let Hs-i.s.t be the operator which associates to any bounded 
and measurable function ft, on X x X the function Ils-i;s,Th given, for any 
(a^o, ■..,Xr) e X^+\ by 

Us-I-.s.tHxo-.t) =^ h{Xs-^l:s) ■ 

Using this notation, we may write S,- — ^s-1:s.tS{-, •, Ys+j-) and G^fSr = 

G^f^s-i:s.rS{-^ •, Ys+t)- Following the same lines as in [16, Lemma 10], 

|G^y;'n,_i.,^,5(-,-,Y,+T)|oo <p'~'-W(^(-,-,y,+t))K,i|oo if t<s-l, 
|Gf;'n,_i.,^,5(-,-,Y,+T)|oo <P*"^osc(5(-,-,Y,+t))K,1U if t>s. 

Consequently, 



s=l 



< (^pl*-^l0Sc{5(v,Y,+T)} ) K.lloo , 



which shows (uil 



Proof of (pli|. By the definition (34), for all x G X and alH G {1, . . . , r}. 



Cf,.'^{x) = j me{x,xt+i)ge{xt+i,Yt+T+i) 

r 

X me(x„_i,da;„).ge(a;„, Y„+T)A(da;t+i:^) 



M=t+2 
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Hence, by A[2] 

> cr- y g0(xt+i, Yf+T+i)£^+i,^l(xt+i)A(dxt+i) 



which concludes the proof of the first statement. By (36), i^and (16 1, 

^2 



mg{x,x')gg{x' ,Yt+T) .1:1 [. ^ X{Ax') > —b^(Yt+T) 



□ 



The proofs of Propositions B.6 and B.7 follow the same lines as [T9', Propo- 
sitions 1-2]. The upper bounds given here provide an explicit dependence on 
the observations. 

Proposition B.6. Assume J^andJ^ Let {{^f,ujf), I < i < N,0 <t < r„+i} 
be the weighted samples obtained by Algorithm [1| in Appendix with input 
variables 0„,t„-)_i, N, YT„+i:T„+r„+i • For all p > 1, there exists a constant C 
such that 



E 



r„+i 



< c 



(f-l)VO r„+i 
'n+1 



7VP-(fVl) 



t=0 



^/-^Iosc{5(.,.,Y,+tJ} 



(44) 



where D^f is defined in (39 I. 
Proof. By Lemma B.5[[iii ) , 







9t-i 





< 



a2 6_(Yt+Tj ■ 



By Lemma |B.5| [i|) and since 6'„ is J^,^(-measurable for all t € {0, . . . , t„_|-i}, 

t}o<t<^ is a martingale difference. Since p > 1, Burkholder's 

inequality (see Theorem 2.10, page 23]) states the existence of a constant 
C depending only on p such that: 



T„ + l 



P/2l 



E 



< CE 
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Hence, 



E 



r„+i 



t=0 



< C 



( (T+\v\o oY 



X E 



r„ + i 

E 

t=0 



N 



p/2 



which impHes, using the convexity inequaHty (X]fe=i Oukf^"^ ^ t^p/'^ J2k=i '^fe^^' 



E 



t=0 



< C 



(r„+i + l)^^ _^E 



t=o 



— y 



Since Yt+T^ and 0„ are (.^-measurable, 



E 



N ^NM„ c I'tf^i 



E 



E 



N ^N,e„ c (cl\ 



•^n,t-l 



6_(Yt+TjP 



By Lemma B.5 i|, using again the Burkholder and convexity inequahties, there 
exists C s.t. 



E 



N ^A',e„ c 

1=1 l'^t,T„_|_i lloO 



-pN 



< CA^(?-i)^"E 



■ AT 

E 



1 Gt,r„" iSt„+i (Ct ) 



-pN 



□ 



The proof is concluded by ( 43 1 . 

Proposition B. 7. ylssiime ^ and y^jfij Let {(^f , w^^), 1 < € < TV, < t < r„+i} 
he the weighted samples obtained by Algorithm ^ in Appendix ^ with input 
variables 0„,t„4_i, N, Yt„+1:T„+t„+i • For all p > 1 and aZ/ p G (liP)? i/iere 
exists a constant C s.t. for any t G {0, . . . , t^+i}, 



E 



X E 



'2;V|*-^Iosc{5(,,Y,+tJ} 



E 



s=l 



■pN 



vlv 



, (45) 
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where is defined in (40 I and a p/p and (3 ^ = 1 — a ^ 



Proof. Lemma B.4 applied with the function h = /Ij'^^^^l and (36 1 yield for 
any 1 < ^ < iV 



E 



■N 

n,t-l 



cy^ 1 



Therefore, by definition of C^f (see (40 1), C"^"" (S^-^^j) is equal to 



+ 1 v-T„ + i 





AN 




AN 




( 



b5 =(e[<,|^„Vi]-A 



N \ 
n,t 



= (E [<,|^„Vi]-<0 



(K \CiN \-pN T. _C)N \ 



E[^^,|j-„Vi] <*E [17^,1 



1 



with 



j-,Ar dcf 1 f '-'t,r„+iSr„ + i(Ct, 



n.t 



1 ^ 

iV ^ 

This can be rewritten 
with 

Ci = i35 (E [<,|^„Vi]-< 
and 



C2 ^ (E [<,|-^„^,_i] - A^,) (E - nl,) 

B^, 1 
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By Lemmas B.4 and Lemmas B.5 iii), and iro 



1 . Cr-\v\oo ^t^n . /■ 0-+ \ \V\ 



and by Lemma B.5 



< 



E[<*|-?-„Vi] <t"V^-; ^-&-(Y*+T„; 

Therefore, there exists a constant C s.t. 



5]pl*-^losc(5(-,-,Y,+T„ 



E[|Cir|j-,^,_i]<C 



2p 



E 



dof 



Applying the Holder inequality with a"=^ p/p > 1 and /3 ^ 1 — a ^ yields 



E 



N \P 



\^n,t\ |E [^^t|-7^i^4_i] — ^i^t 

1/ 



n,t-l 



< E 



\B. 



N |"P 
n.t 



-pN 



E 



|E[<,|^„Vi] -A 



AT 



•^r, i- 



By Proposition B.6 



E 



Lb, 



N 



1/a 



1 G't_;„" jSr„_|_i (Ct ) 



J", 



N 

n.t-1 



1/a 



< CAf(5'^i)-Pa;+(Yt+TjPE 



^pl*-^losc{5(.,.,Y,+Tj} 



-^ni- 



l/a 



Given J^nt-ii the random variables < E 



are conditionally independent, centered and bounded by Lemma |B.5[ Following 
the same steps as in the proof of Proposition [RIG) there exists a constant C such 
that 



E 

Hence, 

E[|Cin <c7v(5vi)+(fvi)-2p 

uj^{Yt+T, 



■N 



1//9 



X E 



2p 



E 



5]pl*-^losc{5(,,Y,+Tj} 



n.t-1 



P/P' 
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Similarly, using 



E 



l/a 



yields 

E[|C2r] < 



X E 



E 



^pl*-%sc{5(-,-,Y,+rJ} 



-pN 
n,t-l 



□ 
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